AITopics

Country:

North America > United States (0.45)
Europe (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.92)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsApr-29-2026, 10:08:28 GMT

adc98a266f45005c403b8311ca7e8bd7-Supplemental-Conference.pdf

artificial intelligence, machine learning, probability, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.50)

Neural Information Processing SystemsApr-24-2026, 16:50:13 GMT

Appendix

In this section we motivate the design choices and inductive biases that we encode into our neural encoder network e, which is the network that is used to model the relative accuracies of the weak supervision sources λ. Recall that we model the probability of a particular sample x X having the class label y Y = {1,...,C}as Pθ(y|λ) = softmax(s)yP(y), (4) s = θ(λ,x)Tλ RC . Connection to prior PGM models We now motivate this choice by deriving a less expressive variant of it from the standard Markov Random Field (MRF) used in the related work. If we view the attention scores θ(λ,x) Rm, that assign sample-dependent accuracies to each labeling function, as sample-independent parameters θ1 and, by that, drop the features from the equation - as is done in the related work [30, 32, 19, 11] - we can rewrite Eq. 4 as exp θT1 1 {λ = y} P We can recognize Pθ as a distribution from the exponential familiy, and more specifically as a pairwise MRF, or factor graph, with canonical parameters θ = (θ1,θ2) and corresponding sufficient statistics, or factors, φ(λ,y) = (φ1(λ,y),φ2(λ)), as well as the log partition function Zθ. The accuracy factors and parameters φ1,θ1 are the core component of this model and sometimes take the form φ1(λy) = λy in binary models as in [30, 19, 11]. The label-independent factors φ2(λ) have, as can be seen from the derivation above, no direct influence on the latent label posterior, but are often used to model labeling propensities 1 {λ 6= 0}and correlation dependencies 1 {λi = λj}, which can be important for PGM parameter learning, but are susceptible to misspecifications [39, 11, 8].

artificial intelligence, experiment, machine learning, (19 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.66)

Neural Information Processing SystemsFeb-7-2026, 12:07:14 GMT

Appendix A Posterior Reparameterization

In this section we motivate the design choices and inductive biases that we encode into our neural encoder network e, which is the network that is used to model the relative accuracies of the weak supervision sources λ. Recall that we model the probability of a particular sample x X having the class label y Y = {1,..., C} as P Our own parameterization therefore is a more expressive variant of these latent-variable PGM models, where we are able to assign LF accuracies on a sample-by-sample basis. Furthermore, our neural encoder network outputs them as a function of the LF outputs and features, and is expected to learn the easy to misspecify dependencies and label-independent statistics implicitly. The top 2 performance scores are highlighted as First, Second. Triplet-median [11] is not listed as it only converged for IMDB with 12 LFs (F1 = 73.0

artificial intelligence, experiment, machine learning, (17 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing SystemsNov-19-2025, 16:47:15 GMT

A Approximate Behavior of Metrics on Sequential Data

How do different metrics behave when used to measure autoregressive model outputs? A.1 Per-T oken Error Probability is Resolution-Limited Here, resolution refers to "the smallest interval measurable After F coin flips, we can only resolve the coin's probability of A.3), we ignore how likely the language model is to over-348 Section 3.2 of [23] gives the exact definition, but the Simulations show that as the per-token error probability slightly increase (e.g. from 0.05 to 0.1), the ROUGE-L-Sum metric sharply falls.Figure 10: Induced emergent MNIST classification ability in convolutional networks.

artificial intelligence, machine learning, probability, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.70)

Aymanns, Christoph, Foerster, Jakob, Georg, Co-Pierre, Weber, Matthias

Fake News in Social Networks

arXiv.org Artificial IntelligenceOct-14-2025

We propose multi-agent reinforcement learning as a new method for modeling fake news in social networks. This method allows us to model human behavior in social networks both in unaccustomed populations and in populations that have adapted to the presence of fake news. In particular the latter is challenging for existing methods. We find that a fake-news attack is more effective if it targets highly connected people and people with weaker private information. Attacks are more effective when the disinformation is spread across several agents than when the disinformation is concentrated with more intensity on fewer agents. Furthermore, fake news spread less well in balanced networks than in clustered networks. We test a part of our findings in a human-subject experiment. The experimental evidence provides support for the predictions from the model, suggesting that the model is suitable to analyze the spread of fake news in social networks.

experiment, machine learning, reinforcement learning, (18 more...)

1708.06233

Country: Europe (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media > News (1.00)
Information Technology (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

arXiv.org Artificial IntelligenceJul-29-2025

How Chain-of-Thought Works? Tracing Information Flow from Decoding, Projection, and Activation

Yang, Hao, Zhao, Qinghua, Li, Lei

Chain-of-Thought (CoT) prompting significantly enhances model reasoning, yet its internal mechanisms remain poorly understood. We analyze CoT's operational principles by reversely tracing information flow across decoding, projection, and activation phases. Our quantitative analysis suggests that CoT may serve as a decoding space pruner, leveraging answer templates to guide output generation, with higher template adherence strongly correlating with improved performance. Furthermore, we surprisingly find that CoT modulates neuron engagement in a task-dependent manner: reducing neuron activation in open-domain tasks, yet increasing it in closed-domain scenarios. These findings offer a novel mechanistic interpretability framework and critical insights for enabling targeted CoT interventions to design more efficient and robust prompts. We released our code and data at https://anonymous.4open.science/r/cot-D247.

large language model, machine learning, natural language, (19 more...)

2507.20758

Country:

Asia (0.69)
North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports > Football (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)

Here's how to generate a truly random number with quantum physics

Breakthroughs, discoveries, and DIY tips sent every weekday. Very little in this life is truly random. A coin flip is influenced by the flipper's force, its surrounding airflow, and gravity. Similar variables dictate rolling a pair of dice or shuffling a deck of cards, while even classical computing's cryptographic algorithms are theoretically susceptible to outside influence or bias. "True randomness is something that nothing in the universe can predict in advance," explained Krister Shalm, a physicist at the National Institute of Standards and Technology (NIST).

artificial intelligence, random number, randomness, (14 more...)

Popular Science

Country: North America > United States > Colorado > Boulder County > Boulder (0.05)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.36)

arXiv.org Artificial IntelligenceMar-6-2025

Enough Coin Flips Can Make LLMs Act Bayesian

Gupta, Ritwik, Corona, Rodolfo, Ge, Jiaxin, Wang, Eric, Klein, Dan, Darrell, Trevor, Chan, David M.

Large language models (LLMs) exhibit the ability to generalize given few-shot examples in their input prompt, an emergent capability known as in-context learning (ICL). We investigate whether LLMs utilize ICL to perform structured reasoning in ways that are consistent with a Bayesian framework or rely on pattern matching. Using a controlled setting of biased coin flips, we find that: (1) LLMs often possess biased priors, causing initial divergence in zero-shot settings, (2) in-context evidence outweighs explicit bias instructions, (3) LLMs broadly follow Bayesian posterior updates, with deviations primarily due to miscalibrated priors rather than flawed updates, and (4) attention magnitude has negligible effect on Bayesian inference. With sufficient demonstrations of biased coin flips via ICL, LLMs update their priors in a Bayesian manner.

bias percentage, icl length 10, language model, (12 more...)

2503.04722

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government (0.46)

arXiv.org Artificial IntelligenceDec-5-2024

Chain-of-Thought in Large Language Models: Decoding, Projection, and Activation

Yang, Hao, Zhao, Qianghua, Li, Lei

Chain-of-Thought prompting has significantly enhanced the reasoning capabilities of large language models, with numerous studies exploring factors influencing its performance. However, the underlying mechanisms remain poorly understood. To further demystify the operational principles, this work examines three key aspects: decoding, projection, and activation, aiming to elucidate the changes that occur within models when employing Chainof-Thought. Our findings reveal that LLMs effectively imitate exemplar formats while integrating them with their understanding of the question, exhibiting fluctuations in token logits during generation but ultimately producing a more concentrated logits distribution, and activating a broader set of neurons in the final layers, indicating more extensive knowledge retrieval compared to standard prompts. Our code and data will be publicly avialable when the paper is accepted.

cot, dataset, probability, (13 more...)